(57) ABSTRACT

A method is provided for detecting computer viruses that
infect text-based files. In accordance with a preferred
embodiment, a collection of virus signatures reflecting
sequences of characters or instructions known to be found in
such viruses is maintained on a computer system. A virus
detection program is also maintained for the purpose of
comparing the contents of computer files to the virus
signatures. Upon execution of the virus detection program,
whitespace within text-based files is transformed such that
each sequence of whitespace characters is replaced by a
single whitespace character. Virus signatures of viruses
known to infect text files are similarly transformed. A
transformed text-based file is then searched for at least one
of said virus signatures. The user is alerted to a possible
virus infection if any of the virus signatures are found in a
file. In another preferred embodiment, an additional
collection of at least one virus signature containing sequences
of characters or instructions known to be found in viruses that
infect executable computer files is maintained on the
computer system. A transformed text-based file is searched for
at least one of the additional virus signatures, which are not
transformed before the search.

23 Claims, 2 Drawing Sheets 
METHOD OF TREATING WHITESPACE DURING VIRUS DETECTION

FIELD OF THE INVENTION

This invention relates to the field of computers and
computer networks. In particular, the present invention
relates to the treatment of whitespace while searching
computer files for a computer virus.

BACKGROUND OF THE INVENTION

A computer virus can be defined as a sequence of commands
or instructions that interfere with a user's operation of, or
cause damage to, his or her computer system. Computer viruses
may damage a computer system directly, such as by deleting
files or formatting a disk, or indirectly, such as by altering
the system's protective measures and thus making the computer
vulnerable to probing or other attacks.

Computer viruses therefore present a significant threat to
the integrity and reliability of computer systems and will
continue to present such a threat due to the trend toward
interconnection of computers. The increase in
computer-to-computer communications, via the internet for
example, has caused a commensurate increase in the spread of
viruses because infected files are spread more easily and
rapidly than ever before.

Virus detection is thus an essential element in the effective
maintenance of computer systems. In order to detect a
computer virus, a virus detection program is generally
employed in conjunction with a series of virus "profiles" or
"signatures" which represent characteristics or patterns of
known viruses. One type of virus detection routine monitors
a program suspected of being infected by a virus. The
program's behavior is compared to a profile of operating
characteristics of a known virus and, if a match is found, the
program is assumed to contain a virus.

While virus creators once focused on binary executable
computer files (e.g., those with .EXE or .COM file
extensions), they have broadened their horizons to target, for
example, macros (such as those executed by word processing
or spreadsheet programs) and even text-based files (e.g.,
word processing files, ASCII text files, etc.). While many text
files are unsuitable for performing malicious actions on
behalf of a virus creator, others, such as batch and script
files, contain instructions that are executed in conjunction
with binary executable programs.

By way of illustration, mIRC is an internet relay chat
program that allows multiple computer users, using computers
remote from each other, to "converse" via the internet. A
communication channel, or "chat room," is established by a
user wishing to discuss a topic. Within a chat room, a user at
one computer types messages that are received and displayed
on the screen of the other users in the same chat room. Users
can come and go from conversations, establish private
communication channels, etc.

Upon its invocation, and during its execution, mIRC
automatically invokes a number of script files to perform
various functions. For example, EVENTS.INI contains
instructions that mIRC applies in response to certain messages
or events (e.g., a particular user joins the conversation,
a conversant uses a specified word or phrase, etc.). Another
script file, COMMANDS.INI, lists shortcut commands a
user may employ. If, for example, the user frequently sends
a particular message or response, he or she may create a
short command (similar to a macro) which, when entered, is
translated by mIRC into the longer message or response.

When one known version of mIRC is started by a user, a
script file named SCRIPT.INI is executed. One command
that may be included in SCRIPT.INI places the user's
computer into a file transfer mode. This mode, which can be
turned on and off, allows remote users in the same chat room
to access the user's computer system and to retrieve files
residing on it. This can be beneficial in the exchange of
information between users, but, if it is included in a user's
SCRIPT.INI without the user's knowledge, the contents of his
or her computer system become vulnerable to pilferage.

Another command that may be executed in SCRIPT.INI
causes the user's SCRIPT.INI file to be automatically
transmitted to the computer system of each person who joins the
user's chat room. Upon receipt of the file, the remote user's
existing SCRIPT.INI file may be overwritten with the
received version. If the transferred SCRIPT.INI file also
enables file transfer mode (as described above), the remote
user's computer system will, unknown to the user, become
vulnerable the next time the script file is run.

These two "features" of mIRC are, in combination,
sometimes termed the "mIRC virus." The virus propagates like a
worm (i.e., it copies the entire file as opposed to simply
inserting viral code into an uninfected file) and exposes a
user's computer system to probing and file theft.

Text files such as the script files used by mIRC contain
various character and formatting codes which merely alter
the appearance of the file and/or its output, but which have
no effect upon the execution of script or batch commands
within the file. For example, when individual commands
within SCRIPT.INI are executed, individual words may be
separated by one space character, two spaces, a dozen
spaces, a line feed, a tab character, etc. These are generally
known as "whitespace" because they are invisible characters
that merely serve to separate visible, printable, characters.

When a text file is edited, its whitespace is often
reformatted or rearranged in order to yield a particular textual
appearance. The resulting text file may contain the identical
sequence of printable characters as a known virus, but have
as little as one difference in the whitespace dividing the
characters of that sequence. Further, multiple text files
infected with the same virus do not always manifest the virus
in identical forms. For example, one text file may have been
edited subsequent to its infection, thus altering the
appearance of the resident virus (including whitespace within the
virus). Although still capable of performing its intended
task, the textual appearance of the virus in the one file is
different from its appearance in a second, unmodified,
infected text file. As a result, when both infected text files are
searched for a specific pattern or sequence of commands
representing the virus in its unmodified form, an infected file
will not necessarily be identified. In other words, a viral
signature that has been modified will not be detected by a
virus detection program and the user will unknowingly
continue to use an infected file.

With viruses that cause indirect damage, such as the
mIRC virus, the user's computer may be exposed to probing
attacks for an extended period of time before the user
becomes aware of and purges the virus. Because the user is
unlikely to notice any direct, obvious damage caused by the
virus (e.g., deleted files, formatted disks), there is nothing to
alert the user to the infection.

As a related problem, some virus detection programs
falsely report the presence of a virus in a text file that merely
describes or refers to a known virus. For example, a text or
word processing file may contain at least one textual
extract (such as messages or other viral indicators that have
been known to appear on the display of an infected computer
system) from viruses known to infect executable computer
files. The extracts may be included in the text file for
informational purposes, such as to educate users as to known
virus symptoms. When a virus detection program searches
computer files for viruses by using indicia such as these
extracts, the program may erroneously report that the text or
word processing file contains a virus.

There is, therefore, a need in the art for a method of
detecting a text-based virus in a text file regardless of how
the whitespace within the virus and the file is formatted.
There is also a need for a method of reducing the frequency
with which virus detection programs falsely identify
text-based files as being infected.

SUMMARY OF THE INVENTION

In accordance with a preferred embodiment, a method is
provided to uniformly transform whitespace within a
text-based computer file so that each combination of
non-whitespace characters within the file is separated by the
same code, preferably a whitespace character or characters.
In this embodiment, on a computer system having at least
one computer file, a sequence of virus detection instructions
is maintained for searching the files for at least one computer
virus. A collection of virus signatures comprising
computer-readable codes that are known or that are likely to
be found in an infected file, or in a virus capable of infecting
a file, is also maintained.

Prior to, or in conjunction with, searching a text computer
file on the computer system for a virus that infects text files,
whitespace (i.e., space, tab, line feed, etc.) within the file is
transformed. Advantageously, each sequence of whitespace
characters is replaced by a common whitespace sequence,
illustratively a single space. A virus signature of a
virus known to infect text files is similarly transformed.

The virus detection instructions are then executed to
compare a transformed virus signature to the contents of the
transformed text file. Detection of a virus signature within
the file indicates that the file is infected with the associated
virus. A user is alerted if a file is determined to be infected.

In another preferred embodiment, the transformed text file
is also searched for a virus signature associated with a virus
that infects executable files. In this embodiment, the virus
signature is not transformed before being compared to the
file contents.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features and advantages of the preferred
embodiments will become more readily apparent from the
following detailed description, which should be read in
conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of a representative computer
system; and

FIG. 2 is a flowchart demonstrating a method of treating
whitespace in accordance with a preferred embodiment.

DETAILED DESCRIPTION

Referring to FIG. 1, there is shown a representative
computer system in which a method in accordance with a
preferred embodiment may be implemented. Computer system
10 illustratively incorporates an IBM-compatible personal
computer, but one skilled in the art will understand that
computer system 10 is not limited to a particular size, class
or model of computer.

Computer system 10 includes a central processing unit
("CPU") 12, a memory unit 14, at least one storage device
16, input device 18, a display device 20, a communication
interface 22, and a printer 24. A system bus 26 is provided
for communicating between the above elements.

Storage device 16 illustratively includes at least one
removable or fixed disk drive, compact disc, DVD, or tape.
Input device 18 is a keyboard, mouse, or other similar
device. Display device 20 illustratively is a computer
display, such as a CRT monitor, LED display or LCD
display. Communication interface 22 may be a modem, a
network interface, or other connection to external electronic
devices, such as a serial or parallel port. Printer 24 is a hard
copy output device such as a laser printer, dot matrix printer,
or plotter.

Storage devices 16 contain a virus detection program 36
(e.g., a search engine) and a file containing at least one virus
signature 38. Virus signatures 38 are sequences of
computer-readable characters that portray viruses found within
textual and/or executable computer files in that they match the
behavior exhibited by, or a series of characters found within,
known viruses. Virus detection program 36 comprises
computer-readable instructions which, when executed by
CPU 12, search for viruses within computer files on storage
devices 16 and/or memory unit 14. Viruses in these computer
files are identified by the detection of tell-tale
characteristics which match one of virus signatures 38.

Virus detection program 36 operates by opening files on
computer system 10 and searching each one for at least one
virus signature 38. One efficacious program for searching
computer files for virus signatures is VirusScan™, a leading
antivirus application produced by Network Associates, Inc.,
formerly known as McAfee Associates. VirusScan™ is a
software application offered for sale in a variety of forms.
VirusScan™ is accompanied by documentation in printed form
(see, e.g., "VirusScan Quick Start Guide", McAfee Associates
1997, accompanying the CD-ROM version of VirusScan for
Windows 95, Windows NT, Windows 3.1x, DOS and OS/2), in
computer-readable form (see, e.g., the directory \MANUALS on
the CD-ROM version of VirusScan for Windows 95, Windows NT,
Windows 3.1x, DOS and OS/2) and on the World Wide Web at
http://www.nai.com. The contents of these documents are
hereby incorporated by reference into the present
application. Other information related to VirusScan™ may be found
in U.S. patent application Ser. No. 09/001,611, filed Dec. 31,
1997, the disclosure of which is hereby incorporated by
reference into the present application.

In one form, the VirusScan™ application is adapted in
accordance with the present invention for use on a user's
client computer running on a Windows 95™ platform. A
primary routine used by this antivirus application is
"SCAN.EXE." In general, the program SCAN.EXE operates
by comparing the contents of a file with at least one
known virus signature to determine if there is a match. In
accordance with the present invention, the program SCAN.EXE
has been adapted to serve as virus detection program
36 and to more effectively search for text-based viruses.
Further, SCAN.EXE has been adapted to decrease or eliminate
the erroneous detection of viruses within text or word
processing files. Finally, SCAN.EXE retains its former
capability of scanning executable files for viruses. In a typical
configuration, SCAN.EXE draws upon at least one virus
signature file, herein represented by the file name
SCAN.DAT.

In accordance with a preferred embodiment, SCAN.EXE
is modified to process text files on computer 10 prior to, or
in conjunction with, being searched for virus signatures 38.
As modified, SCAN.EXE transforms a text file's
"whitespace." As used herein, "whitespace" refers to a set of
whitespace characters or whitespace sequences that may be
found in a computer file. A "whitespace sequence" refers to
a sequence of at least one whitespace character, and
"whitespace character" refers to a non-printable or invisible
character that may be used for formatting or control
purposes, illustratively including any or all of the following:
space, backspace, tab, vertical tab, line feed, form feed, and
carriage return. For example, in IBM-compatible personal
computers the whitespace characters are the decimal ASCII
character codes 8-13 and 32. In contrast, printable
characters illustratively include alphanumeric characters (e.g.,
those with decimal ASCII character codes in the range
48-57, 65-90 and 97-122) as well as punctuation marks and
typographic symbols (e.g., decimal ASCII character codes
33-47, 58-64, 91-96 and 123-126).
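
By way of illustration only, the character classes recited above can be written down as a small lookup. The following Python sketch is not part of SCAN.EXE; the names WHITESPACE_CODES, ALPHANUMERIC_CODES, SYMBOL_CODES and classify are hypothetical and chosen here for clarity.

```python
# Character classes taken from the decimal ASCII ranges recited in the text.
# All names below are hypothetical; they are not identifiers from SCAN.EXE.

# Whitespace: backspace, tab, line feed, vertical tab, form feed,
# carriage return (8-13) and space (32).
WHITESPACE_CODES = frozenset(range(8, 14)) | {32}

# Alphanumeric: digits (48-57), upper case (65-90), lower case (97-122).
ALPHANUMERIC_CODES = (
    frozenset(range(48, 58)) | frozenset(range(65, 91)) | frozenset(range(97, 123))
)

# Punctuation marks and typographic symbols: 33-47, 58-64, 91-96, 123-126.
SYMBOL_CODES = (
    frozenset(range(33, 48))
    | frozenset(range(58, 65))
    | frozenset(range(91, 97))
    | frozenset(range(123, 127))
)


def classify(code: int) -> str:
    """Return the class of a single character code."""
    if code in WHITESPACE_CODES:
        return "whitespace"
    if code in ALPHANUMERIC_CODES:
        return "alphanumeric"
    if code in SYMBOL_CODES:
        return "symbol"
    return "other"
```

Taken together, the alphanumeric and symbol ranges cover every code from 33 through 126, so each code in the ranges 8-13 and 32-126 falls into exactly one of the three named classes.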

In particular, SCAN.EXE performs a whitespace
transformation on the text file by replacing each of the various
whitespace sequences found in the text file with a common 
whitespace sequence, e.g. a single whitespace character such 
as a space. All whitespace sequences within text-based files 
are thus transformed by SCAN.EXE to common, uniform, 
representations. The result of this transformation is text files 
in which words and other series of visible, printable, char- 
acters are separated only by a single, known, character. 
Therefore, when the transformed text file is to be searched, 
the search procedure need not be concerned with the myriad 
possible whitespace sequences that may have been found in 
the original file. This is advantageous because users may edit 
an infected text file before it is searched, and thereby modify 
whatever whitespace was originally included in the virus. 
Because of such user modifications, searching for a text 
virus based on a profile or signature including anything more 
than the basic whitespace formatting provided by the present 
invention will likely fail to find the virus in infected files that 
were edited. 
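
The transformation described above may be sketched, under stated assumptions, as a single regular-expression substitution. The sketch assumes byte-oriented input and the whitespace codes 8-13 and 32 given earlier. The function name transform_whitespace and the sample strings are hypothetical and do not come from any real script file.

```python
import re

# One or more consecutive whitespace bytes (decimal codes 8-13 and 32).
_WHITESPACE_RUN = re.compile(rb"[\x08-\x0d\x20]+")


def transform_whitespace(data: bytes) -> bytes:
    """Replace every whitespace sequence with a single space character."""
    return _WHITESPACE_RUN.sub(b" ", data)


# Two made-up copies of the same command text, one of them re-edited so
# that its whitespace differs from the other copy.
as_written = b"COPY  SCRIPT.INI\tTO  PEER"
as_edited = b"COPY SCRIPT.INI\r\n    TO PEER"

# After transformation the two copies are byte-for-byte identical.
same = transform_whitespace(as_written) == transform_whitespace(as_edited)
```

Because both copies reduce to the same byte string, a search routine operating on transformed data needs only one representation of the sequence.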

Prior to being compared to the contents of the transformed
text file, a virus signature that represents a text-based virus
is also subjected to the same whitespace transformation
applied to the text-based file. Thus, in a preferred
embodiment, each whitespace sequence within the
computer-readable characters of the virus signature is
transformed to a single whitespace character, and this character
is the same as the whitespace character inserted in the
transformed text file. By uniformly transforming all
whitespace sequences within both the virus signature and a
file to be searched, a virus in an infected file is much more
likely to be located.

In the presently described preferred embodiment, the file
being searched is only transformed if it is a text file. It is,
however, transformed not only when being searched for
text-based viruses, but also when being searched for viruses
that are known to attack executable files (e.g., those with
.EXE or .COM extensions). As described below, by
transforming the text file when searching for executable file
viruses, the frequency with which false detections occur is
decreased.

In particular, in some instances, a text file may simply
report or list a known virus profile or signature (e.g., a
message printed on the display of an infected computer
system) that is associated with a virus that attacks executable
files. In such a case, the text file is not actually infected but
a comparison of the text file with at least one virus signature
would be likely to yield a match and an incorrect indication
that the text file was infected with a virus. However, by
transforming the text file, including any internal references
to or lists of virus profiles and signatures, but not
transforming the signature of an executable virus before a match is
attempted, it is unlikely that a match will be found. As a
result, false detections of the executable virus within
text-based files will be minimized.

Finally, when searching an executable file for virus
signatures in accordance with a preferred embodiment, there is
generally no transformation associated with the file or
signatures in the signature file. Since executable files are not
generally edited by users, there is generally no need to
accommodate various whitespace formats. The whitespace
configuration reflected in any file infected by an executable
file virus will most likely match the whitespace
configuration of the virus signature.

Thus, in a preferred embodiment, the following matrix
identifies when to transform a file being searched for a virus
or the virus signature representing the virus for which the file
is being searched.

When searching     For text virus               For executable virus

Text files         Transform text file          Transform text file,
                   and virus signature.         but not virus signature.

Executable files   Transform virus signature.   Do not transform file
                   Do not transform file.       or virus signature.

FIG. 2 is a flowchart demonstrating a method of treating
whitespace in accordance with a preferred embodiment. In
the illustrated method the virus detection procedure (e.g., the
virus detection program SCAN.EXE) is invoked "on
demand" by a system user. It is understood, however, that
this method is easily modified for execution in response to
a specified event (e.g., booting or shutting down computer
system 10) or at a specified time (e.g., every night at a
pre-scheduled time). SCAN.EXE can also be configured to
search all or a subset of files on computer system 10. A user
may choose to search files on all or a subset of storage
devices and memory units and may choose to search only
particular types of files (e.g., executable, text-based).

In step 50 SCAN.EXE, as modified with instructions
capable of transforming text files, is installed on computer
system 10 along with SCAN.DAT, which includes at least
one virus signature. The virus signatures incorporated in
SCAN.DAT represent virus behavior or sequences of
characters derived from known and/or suspected viruses. The
whitespace within virus signatures pertaining to text-based
viruses is transformed, as discussed above, before such virus
signatures are added to SCAN.DAT.

In step 52 a user invokes SCAN.EXE to search at least
one file on computer system 10 for computer viruses.
SCAN.EXE opens (step 54) a first file and determines (step
56) whether the file is an executable file (such as an
executable program's object code) or a text file (such as
script, batch, data and word processing files). At step 56,
SCAN.EXE illustratively examines the first 100 characters
of the file. As long as at least approximately 90% of them are
printable characters, the file is considered a text file. For
purposes of the presently illustrated embodiment, printable
characters may include any or all whitespace characters (as
described above), alphanumeric characters, punctuation
marks, and typographic symbols. Illustratively, the ASCII
character set comprising the decimal ranges of 8-13 and
32-126 are considered printable characters. One skilled in
the art will understand that a wider range of characters may
be considered printable without exceeding the scope of the
preferred embodiments.
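
The file-type determination of step 56 may be sketched as follows. This is a minimal illustration that assumes the file contents are already available as bytes. The function name looks_like_text, its parameter names, and the treatment of an empty file as non-text are assumptions made here and are not stated in the description.

```python
def looks_like_text(data: bytes, sample_size: int = 100,
                    threshold: float = 0.90) -> bool:
    """Classify a file as text if enough of its leading bytes are printable.

    "Printable" follows the illustrative ranges in the description:
    decimal codes 8-13 and 32-126.
    """
    sample = data[:sample_size]
    if not sample:
        # Assumption: an empty file is not treated as a text file.
        return False
    printable = sum(1 for b in sample if 8 <= b <= 13 or 32 <= b <= 126)
    return printable / len(sample) >= threshold
```

A file shorter than the sample size is judged on the characters it does contain, which is one possible reading of "the first 100 characters"; other readings are equally consistent with the text.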

If determined to be a text-based file, the whitespace within
the file is transformed (step 58) as described above.
Subsequently, a virus signature from SCAN.DAT is selected 
(step 60) for comparison with the contents of either the 
executable file or the transformed text file. 

If a text file is being searched for a virus that targets text 
files, the virus signature will already have been similarly
transformed (e.g., prior to being added to SCAN.DAT). In 
particular, each whitespace sequence within the computer- 
readable characters of the virus signature will have been 
transformed to the same whitespace character, and that 
character will be identical to the character to which the text 
file whitespace is transformed. 

Virus signatures relating to text-based viruses are
illustratively identified as such at the time they are added to
SCAN.DAT. Advantageously, flags in SCAN.DAT are set to
indicate the type of virus that the virus signature represents
and/or the type or types of files that the associated virus
infects (e.g., executable, text-based). Thus, when added to
SCAN.DAT in the presently illustrated preferred
embodiment, text-based virus signatures that are to be
compared to text files are transformed and segregated from
virus signatures that are to be compared to executable files.
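
One hypothetical way to carry such a flag, and to apply the transformation matrix given earlier, is sketched below. The actual layout of SCAN.DAT is not disclosed in this description. The Signature record, its field names, and the prepare function are therefore assumptions introduced solely for illustration.

```python
import re
from dataclasses import dataclass
from typing import Tuple

_WHITESPACE_RUN = re.compile(rb"[\x08-\x0d\x20]+")


def collapse(data: bytes) -> bytes:
    """Replace each whitespace sequence (codes 8-13, 32) with one space."""
    return _WHITESPACE_RUN.sub(b" ", data)


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature record; not the real SCAN.DAT format."""
    name: str
    pattern: bytes
    targets_text: bool  # flag: the signature represents a text-based virus


def prepare(sig: Signature, file_data: bytes,
            file_is_text: bool) -> Tuple[bytes, bytes]:
    """Return (contents, pattern) transformed according to the matrix.

    The file is transformed only if it is a text file; the signature is
    transformed only if it represents a text-based virus.
    """
    contents = collapse(file_data) if file_is_text else file_data
    pattern = collapse(sig.pattern) if sig.targets_text else sig.pattern
    return contents, pattern


# Made-up example: a text document quotes an executable-virus banner.
banner = Signature("demo-exe", b"EXAMPLE  BANNER", targets_text=False)
contents, pattern = prepare(banner, b"shows EXAMPLE  BANNER on screen", True)
quoted_banner_matches = pattern in contents
```

In the made-up example the quoted banner no longer matches, because the two spaces in the document were collapsed while the executable-virus signature was left intact. The effect depends on the signature containing a whitespace sequence other than a single space.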

In another mode of operation, however, the original 
format of whitespace within the text-based virus signatures 
added to SCAN.DAT is left intact. In this mode of operation, 
then, the whitespace of such text-based virus signatures is
transformed after the signature is selected to be compared to 
the contents of the executable or transformed text file. 

The file, whether textual in nature or executable, is then
searched (step 62) for the selected virus signature. If the
virus signature is found (step 64) within the file, thus
indicating the file is infected, a user is alerted (step 66).

If the virus signature is not found within the file (step 64),
SCAN.EXE determines (step 68) whether the open file is to
be searched for another virus signature. If the open file is to
be searched for another signature, the illustrated method
returns to step 60. Otherwise, SCAN.EXE determines (step
70) whether another file on computer system 10 is to be
searched. If not, the program exits; otherwise, SCAN.EXE
resumes at step 54.
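
The flow of steps 54 through 70 may be summarized in the following sketch. It is illustrative only. Files are represented as in-memory (name, contents) pairs and signatures as (name, pattern, targets_text) tuples, both of which are hypothetical simplifications. The description does not state what happens after the alert of step 66, so the sketch assumes the method proceeds to the next file.

```python
import re
from typing import Iterable, Iterator, List, Tuple

_WHITESPACE_RUN = re.compile(rb"[\x08-\x0d\x20]+")


def collapse(data: bytes) -> bytes:
    """Replace each whitespace sequence (codes 8-13, 32) with one space."""
    return _WHITESPACE_RUN.sub(b" ", data)


def is_text(data: bytes) -> bool:
    """Step 56: at least 90% of the first 100 bytes are printable."""
    sample = data[:100]
    if not sample:
        return False  # assumption: empty files are treated as non-text
    printable = sum(1 for b in sample if 8 <= b <= 13 or 32 <= b <= 126)
    return printable / len(sample) >= 0.90


def scan(files: Iterable[Tuple[str, bytes]],
         signatures: List[Tuple[str, bytes, bool]]) -> Iterator[Tuple[str, str]]:
    """Yield (file name, signature name) for each file found infected."""
    for file_name, data in files:                        # steps 54 and 70
        contents = collapse(data) if is_text(data) else data  # steps 56, 58
        for sig_name, pattern, targets_text in signatures:    # steps 60, 68
            needle = collapse(pattern) if targets_text else pattern
            if needle in contents:                       # steps 62 and 64
                yield file_name, sig_name                # step 66: alert
                break  # assumption: move on to the next file after an alert
```

For example, calling list(scan(files, signatures)) on a collection of such pairs returns the alerts in the order in which the files were examined.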

Various preferred embodiments have been described. The
descriptions are intended to be illustrative, not limiting.
Thus, it will be apparent to those skilled in the art that
modifications may be made to the invention as described
without departing from the scope of the claims set out below.
For example, while preferred embodiments have been
described in terms of transforming each whitespace
sequence to a single whitespace character, it will be
understood that other transformation procedures can be used.
Generally speaking, methods of whitespace handling in
accordance with the preferred embodiments are applicable
wherever whitespace sequences between successive blocks
of text are converted according to similar rules in both text
files and the virus signatures associated with viruses that
infect text files. A particularly advantageous rule is that all
whitespace sequences, regardless of length or of the specific
whitespace character content, are converted to the same code
which comprises a whitespace character or characters.
One of skill in the art will also understand that text file
whitespace sequences may instead be converted into other,
non-whitespace, characters. For example, a visible, printable
character or characters may be used to replace whitespace
sequences between successive blocks of text in a particular
word processing environment. In addition, there may be
instances in which no transformation of the virus signature
is necessary. For instance, the virus signature may have
previously been transformed into a sequence in accordance
with a whitespace transformation rule. In such case the
original virus signature can be stored in a compressed
format.
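
As a brief illustration of such an alternative rule, the sketch below replaces each whitespace sequence with a caller-supplied code, which may be a visible character. The function name apply_rule and the sample data are hypothetical. The only property relied upon is that the same rule is applied to both the file and the signature.

```python
import re

_WHITESPACE_RUN = re.compile(rb"[\x08-\x0d\x20]+")


def apply_rule(data: bytes, code: bytes) -> bytes:
    """Replace each whitespace sequence with the given code.

    A callable replacement is used so that the code is inserted literally,
    even if it contains a backslash.
    """
    return _WHITESPACE_RUN.sub(lambda _match: code, data)


# Made-up file contents and signature, both converted with the same rule.
file_data = b"first line\r\n  run \t payload\r\nlast line"
signature = b"run payload"
found = apply_rule(signature, b"_") in apply_rule(file_data, b"_")
```

One consequence of choosing a visible code is that text which already contains that character becomes indistinguishable from text that contained whitespace at the same position, so the choice of code may affect how often unrelated text matches.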
What is claimed is: 

1. A method of searching a text-based computer file for a
computer virus known to infect text-based files using a
stored sequence of computer-readable characters associated
with the computer virus, comprising the steps of:

transforming whitespace within the text-based file in
accordance with a whitespace transformation rule to
form a transformed text-based file;

transforming whitespace within the stored sequence of
computer-readable characters in accordance with said
whitespace transformation rule to form a transformed
sequence of computer-readable characters; and

searching said transformed text-based file for at least one
occurrence of said transformed sequence of
computer-readable characters, wherein the computer virus is
detected upon an identification of at least one such
occurrence.

2. The method of claim 1, said whitespace comprising at
least one whitespace sequence, wherein said whitespace
transformation rule is designed to transform said at least one
whitespace sequence into a common predetermined
whitespace sequence.

3. The method of claim 2, wherein said common prede- 
termined whitespace sequence comprises a single 
whitespace character. 

4. The method of claim 1, wherein prior to the step of 
transforming whitespace within the text-based file, a step of 
determining whether the computer file is indeed a text-based 
file is performed, said determining step comprising the steps 
of: 

examining a predetermined number of characters in the 
computer file; and 

determining whether a percentage of the examined char- 
acters that are printable characters exceeds a predeter- 
mined percentage. 

5. The method of claim 4, wherein said predetermined
percentage is greater than or equal to 90 percent.

6. The method of claim 4, wherein printable characters 
comprise ASCII character codes in the decimal range of 
8-13 and 32-126. 

7. The method of claim 4, wherein said predetermined 
number of characters is greater than or equal to 100. 

8. The method of claim 3, wherein said single whitespace 
character is a space character. 

9. The method of claim 8, wherein said whitespace 
sequence comprises at least one from the group consisting 
of: space, tab, vertical tab, line feed, form feed, carriage 
return, and null characters. 

10. The method of claim 1, said whitespace comprising at
least one whitespace sequence, wherein said whitespace
transformation rule is designed to transform said at least one
whitespace sequence into a common predetermined
non-whitespace sequence.

11. The method of claim 10, wherein said at least one
whitespace sequence comprises at least one from the group
consisting of: space, tab, vertical tab, line feed, form feed,
carriage return, and null.
12. A method of searching for a virus in a computer file
that includes whitespace, the method comprising the steps
of:

storing at least one virus profile; 

determining whether the computer file is a text file; 

if the computer file is a text file, reformatting the contents 
of the computer file to convert a sequence of 
whitespace characters into a single code; and 

comparing the contents of the computer file with said at 
least one virus profile. 

13. The method of claim 12 wherein said at least one virus 
profile comprises a plurality of whitespace characters, said 
method further comprising the step of transforming succes- 
sive whitespace characters in said plurality of characters to 
a single code if the computer file is a text file. 

14. The method of claim 12 wherein said single code is a
space character. 

15. The method of claim 12 wherein said sequence of 
whitespace characters comprises at least one from the group 
consisting of: space, tab, vertical tab, line feed, form feed,
carriage return, and null. 

16. The method of claim 12 wherein whitespace characters
are non-printable computer-readable characters.

17. The method of claim 12 wherein said determining step 
comprises the steps of: 

examining a predetermined number of characters in the 
file; and 

determining the percentage of the examined characters
that are printable characters;

wherein said computer file is determined to be a text file
if 90% or more of the predetermined number of
characters are printable characters.

18. The method of claim 17 wherein printable characters 
comprise ASCII character codes in the decimal range of 
8-13 and 32-126. 

19. A method of searching a computer file for a computer 
virus comprising the steps of: 

storing a virus profile comprising a sequence of computer- 
readable characters associated with a computer virus; 

determining whether the computer file is a text-based file; 

transforming whitespace within the computer file if the 
computer file is a text-based file; and 

searching said computer file for said virus profile. 

20. The method of claim 19 further comprising the step of
transforming whitespace within the virus profile if the virus is
known to infect text-based files.

21. The method of claim 19 wherein said transforming 
step comprises the steps of: 

identifying a sequence of at least one whitespace 
character, said sequence containing only non-printable, 
computer-readable characters; and 

replacing said sequence of at least one whitespace char- 
acter with a code. 

22. The method of claim 21 wherein the code is a single 
whitespace character. 

23. The method of claim 21 wherein the code is a single 
non-whitespace character. 